Telecom Churn DataFrame
First 10 rows
| | State | Account length | Area code | International plan | Voice mail plan | Number vmail messages | Total day minutes | Total day calls | Total day charge | Total eve minutes | Total eve calls | Total eve charge | Total night minutes | Total night calls | Total night charge | Total intl minutes | Total intl calls | Total intl charge | Customer service calls | Churn | Total calls |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | No | Yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | False | 300 |
| 1 | OH | 107 | 415 | No | Yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | False | 329 |
| 2 | NJ | 137 | 415 | No | No | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False | 328 |
| 3 | OH | 84 | 408 | Yes | No | 0 | 299.4 | 71 | 50.90 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False | 248 |
| 4 | OK | 75 | 415 | Yes | No | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False | 356 |
| 5 | AL | 118 | 510 | Yes | No | 0 | 223.4 | 98 | 37.98 | 220.6 | 101 | 18.75 | 203.9 | 118 | 9.18 | 6.3 | 6 | 1.70 | 0 | False | 317 |
| 6 | MA | 121 | 510 | No | Yes | 24 | 218.2 | 88 | 37.09 | 348.5 | 108 | 29.62 | 212.6 | 118 | 9.57 | 7.5 | 7 | 2.03 | 3 | False | 314 |
| 7 | MO | 147 | 415 | Yes | No | 0 | 157.0 | 79 | 26.69 | 103.1 | 94 | 8.76 | 211.8 | 96 | 9.53 | 7.1 | 6 | 1.92 | 0 | False | 269 |
| 8 | LA | 117 | 408 | No | No | 0 | 184.5 | 97 | 31.37 | 351.6 | 80 | 29.89 | 215.8 | 90 | 9.71 | 8.7 | 4 | 2.35 | 1 | False | 267 |
| 9 | WV | 141 | 415 | Yes | Yes | 37 | 258.6 | 84 | 43.96 | 222.0 | 111 | 18.87 | 326.4 | 97 | 14.69 | 11.2 | 5 | 3.02 | 0 | False | 292 |
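In the rows above, `Total calls` equals the sum of the day, evening, and night call counts (for row 0, 110 + 99 + 91 = 300); international calls are not included. The source does not show how the column was built, so the following is only a minimal pandas sketch of one way to derive it. The variable names `df` and `call_columns` are hypothetical.

```python
import pandas as pd

# Three rows copied from the table above (call-count columns only).
df = pd.DataFrame(
    {
        "Total day calls": [110, 123, 114],
        "Total eve calls": [99, 103, 110],
        "Total night calls": [91, 103, 104],
        "Total intl calls": [3, 3, 5],
    }
)

# Assumption: Total calls = day + evening + night calls (international excluded).
call_columns = ["Total day calls", "Total eve calls", "Total night calls"]
df["Total calls"] = df[call_columns].sum(axis=1)

print(df["Total calls"].tolist())  # [300, 329, 328], matching rows 0-2 above
```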
Input data
Charts
DataFrame analysis
Model predicting the Churn feature with the random forest method
Pie chart of the Churn feature
Histogram of international plan presence among churned customers
Histogram of international plan presence among retained customers
Models are sorted by F1 score because both precision and recall matter for the evaluation.
Cross-validation uses 5 folds.
| index | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT(Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lightgbm | Light Gradient Boosting Machine | 0.9017 | 0.8693 | 0.5609 | 0.6741 | 0.6087 | 0.5534 | 0.5585 | 0.978 |
| gbc | Gradient Boosting Classifier | 0.8761 | 0.8511 | 0.6632 | 0.5409 | 0.5953 | 0.5232 | 0.5273 | 4.866 |
| rf | Random Forest Classifier | 0.8867 | 0.855 | 0.4688 | 0.6163 | 0.5315 | 0.4684 | 0.4746 | 1.526 |
| lr | Logistic Regression | 0.8119 | 0.8338 | 0.729 | 0.4019 | 0.5165 | 0.4126 | 0.4416 | 0.072 |
| svm | SVM - Linear Kernel | 0.8119 | 0.0 | 0.7069 | 0.3985 | 0.5087 | 0.4041 | 0.4298 | 0.066 |
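The comparison above ranks models by their mean F1 over 5 cross-validation folds. The source does not show the code that produced the table, so the sketch below uses scikit-learn as a stand-in to illustrate the same idea. The synthetic data, the two chosen models, and all variable names are assumptions for illustration, and the scores it prints are unrelated to the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data (hypothetical; not the Telecom Churn set).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=142
)

models = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=142),
    "lr": LogisticRegression(max_iter=1000),
}

# 5-fold stratified cross-validation, scored by F1 on the positive class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=142)
results = [
    (name, cross_val_score(model, X, y, cv=cv, scoring="f1").mean())
    for name, model in models.items()
]

# Sort best-first by mean F1, as in the comparison table.
results.sort(key=lambda item: item[1], reverse=True)
for name, mean_f1 in results:
    print(f"{name}: mean F1 = {mean_f1:.4f}")
```

Stratified folds keep the churn/non-churn ratio roughly equal in every fold, which matters when the positive class is rare.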
Bagged model
| index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8997 | 0.8406 | 0.6667 | 0.6207 | 0.6429 | 0.5846 | 0.5851 |
| 1 | 0.9223 | 0.8914 | 0.6545 | 0.75 | 0.699 | 0.6547 | 0.6567 |
| 2 | 0.8847 | 0.9076 | 0.5636 | 0.5849 | 0.5741 | 0.5074 | 0.5076 |
| 3 | 0.8847 | 0.8458 | 0.5455 | 0.5882 | 0.566 | 0.4997 | 0.5001 |
| 4 | 0.8869 | 0.8774 | 0.6481 | 0.5738 | 0.6087 | 0.5429 | 0.5443 |
| Mean | 0.8957 | 0.8726 | 0.6157 | 0.6235 | 0.6181 | 0.5579 | 0.5588 |
| SD | 0.0144 | 0.0259 | 0.0506 | 0.0651 | 0.0488 | 0.057 | 0.0575 |
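The Mean and SD summary values can be checked against the per-fold values. The sketch below does this for the Accuracy column of the bagged model. It assumes the SD is the population standard deviation (dividing by the number of folds), which reproduces the reported figure; the variable names are hypothetical.

```python
from statistics import mean, pstdev

# Per-fold accuracy of the bagged model, copied from the table above.
fold_accuracy = [0.8997, 0.9223, 0.8847, 0.8847, 0.8869]

fold_mean = mean(fold_accuracy)
# pstdev divides by n (population SD); assumption: this is what the SD row reports.
fold_sd = pstdev(fold_accuracy)

print(round(fold_mean, 4), round(fold_sd, 4))  # 0.8957 0.0144
```

The sample standard deviation (dividing by n - 1) would give about 0.0161 for the same values, so it does not match the table.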
Boosted model
| index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8922 | 0.8294 | 0.5926 | 0.6038 | 0.5981 | 0.5359 | 0.5359 |
| 1 | 0.9198 | 0.8953 | 0.6 | 0.7674 | 0.6735 | 0.6285 | 0.6347 |
| 2 | 0.8872 | 0.8937 | 0.5273 | 0.6042 | 0.5631 | 0.4987 | 0.5002 |
| 3 | 0.8922 | 0.8456 | 0.5091 | 0.6364 | 0.5657 | 0.505 | 0.5091 |
| 4 | 0.892 | 0.8874 | 0.6296 | 0.5965 | 0.6126 | 0.5499 | 0.5502 |
| Mean | 0.8967 | 0.8703 | 0.5717 | 0.6416 | 0.6026 | 0.5436 | 0.546 |
| SD | 0.0117 | 0.0274 | 0.0458 | 0.0644 | 0.0402 | 0.0465 | 0.0478 |
Blended model
| index | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.8972 | 0.8348 | 0.6296 | 0.6182 | 0.6239 | 0.5644 | 0.5644 |
| 1 | 0.9173 | 0.891 | 0.5818 | 0.7619 | 0.6598 | 0.6137 | 0.6209 |
| 2 | 0.8897 | 0.9023 | 0.4909 | 0.6279 | 0.551 | 0.4892 | 0.4941 |
| 3 | 0.8972 | 0.8477 | 0.5273 | 0.6591 | 0.5859 | 0.528 | 0.5323 |
| 4 | 0.892 | 0.8888 | 0.6111 | 0.6 | 0.6055 | 0.5429 | 0.543 |
| Mean | 0.8987 | 0.8729 | 0.5681 | 0.6534 | 0.6052 | 0.5476 | 0.5509 |
| SD | 0.0098 | 0.0266 | 0.0519 | 0.0575 | 0.0364 | 0.0411 | 0.0418 |
Best model
| index | Parameters |
|---|---|
| algorithm | SAMME.R |
| base_estimator | LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0, importance_type='split', learning_rate=0.1, max_depth=-1, min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0, n_estimators=100, n_jobs=-1, num_leaves=31, objective=None, random_state=142, reg_alpha=0.0, reg_lambda=0.0, silent='warn', subsample=1.0, subsample_for_bin=200000, subsample_freq=0) |
| learning_rate | 1.0 |
| n_estimators | 10 |
| random_state | 142 |
ROC curve
Precision-recall curve
Confusion matrix
AdaBoostClassifier
Prediction on test data
| index | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Light Gradient Boosting Machine | 0.9245 | 0.9048 | 0.7054 | 0.7521 | 0.728 | 0.6842 | 0.6847 |
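F1 is the harmonic mean of precision and recall, so the F1 value in the test-data row can be reproduced from the other two columns. A minimal sketch, with the helper name `f1_from_precision_recall` chosen here for illustration:

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the test-data row above.
precision, recall = 0.7521, 0.7054
f1 = f1_from_precision_recall(precision, recall)

print(round(f1, 4))  # 0.728, matching the F1 column
```

The same check does not hold exactly for the Mean rows of the cross-validation tables, because an average of per-fold F1 scores is not the F1 of the averaged precision and recall.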
